Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Benchmark and system parameters often have a significant impact on performance evaluation, which raises a long-lasting question about which settings we should use. This paper studies the feasibility and benefits of extensive evaluation. A full extensive evaluation, which tests all possible settings, is usually too expensive. This work investigates whether it is possible to sample a subset of the settings and, upon them, generate observations that match those from a full extensive evaluation. Towards this goal, we have explored the incremental sampling approach, which starts by measuring a small subset of random settings, builds a prediction model on these samples using the popular ANOVA approach, adds more samples if the model is not accurate enough, and terminates otherwise. To summarize our findings: 1) Enhancing a research prototype to support extensive evaluation mostly involves changing hard-coded configurations, which does not take much effort. 2) Some systems are highly predictable, which means that they can achieve accurate predictions with a low sampling rate, but some systems are less predictable. 3) We have not found a method that can consistently outperform random sampling + ANOVA. Based on these findings, we provide recommendations to improve artifact predictability and strategies for selecting parameter values during evaluation.more » « less
-
Communities are a common and widely studied structure in networks, typically assum- ing that the network is fully and correctly observed. In practice, network data are often collected by querying nodes about their connections. In some settings, all edges of a sam- pled node will be recorded, and in others, a node may be asked to name its connections. These sampling mechanisms introduce noise and bias, which can obscure the community structure and invalidate assumptions underlying standard community detection methods. We propose a general model for a class of network sampling mechanisms based on recording edges via querying nodes, designed to improve community detection for network data col- lected in this fashion. We model edge sampling probabilities as a function of both individual preferences and community parameters, and show community detection can be performed by spectral clustering under this general class of models. We also propose, as a special case of the general framework, a parametric model for directed networks we call the nomination stochastic block model, which allows for meaningful parameter interpretations and can be fitted by the method of moments. In this case, spectral clustering and the method of mo- ments are computationally ecient and come with theoretical guarantees of consistency. We evaluate the proposed model in simulation studies on unweighted and weighted net- works and under misspecified models. The method is applied to a faculty hiring dataset, discovering a meaningful hierarchy of communities among US business schools.more » « less
-
Abstract In a complex network, the core component with interesting structures is usually hidden within noninformative connections. The noises and bias introduced by the noninformative component can obscure the salient structure and limit many network modeling procedures’ effectiveness. This paper introduces a novel core–periphery model for the noninformative periphery structure of networks without imposing a specific form of the core. We propose spectral algorithms for core identification for general downstream network analysis tasks under the model. The algorithms enjoy strong performance guarantees and are scalable for large networks. We evaluate the methods by extensive simulation studies demonstrating advantages over multiple traditional core–periphery methods. The methods are also used to extract the core structure from a citation network, which results in a more interpretable hierarchical community detection.more » « less
-
Abstract Linear regression on network-linked observations has been an essential tool in modelling the relationship between response and covariates with additional network structures. Previous methods either lack inference tools or rely on restrictive assumptions on social effects and usually assume that networks are observed without errors. This paper proposes a regression model with non-parametric network effects. The model does not assume that the relational data or network structure is exactly observed and can be provably robust to network perturbations. Asymptotic inference framework is established under a general requirement of the network observational errors, and the robustness of this method is studied in the specific setting when the errors come from random network models. We discover a phase-transition phenomenon of the inference validity concerning the network density when no prior knowledge of the network model is available while also showing a significant improvement achieved by knowing the network model. Simulation studies are conducted to verify these theoretical results and demonstrate the advantage of the proposed method over existing work in terms of accuracy and computational efficiency under different data-generating models. The method is then applied to middle school students' network data to study the effectiveness of educational workshops in reducing school conflicts.more » « less
An official website of the United States government

Full Text Available